Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis

Authors

  • J. Sankari
  • R. Manavalan
  • K. S. Rangasamy

Abstract

Clustering is one of the most interesting and important research tools in data mining and other disciplines. The aim of clustering is to find relationships among data objects and to group them into meaningful subgroups. The effectiveness of a clustering algorithm depends on how appropriate the similarity measure used to compare the data objects is. This paper presents a performance analysis of Hierarchical Agglomerative Clustering with a Multi-Viewpoint similarity measure, based on cosine similarity and on correlation similarity, for finding relationships between documents and clustering them. The experiment is conducted on fifteen text documents, and the performance of the proposed method is analyzed and compared with Hierarchical Agglomerative Clustering with a Multi-Viewpoint similarity measure based on cosine similarity. The experimental results show that the proposed model, Hierarchical Agglomerative Clustering with a Multi-Viewpoint similarity measure based on correlation similarity, performs well for document retrieval.

Keywords: Hierarchical Agglomerative Clustering, document retrieval, Multi-Viewpoint similarity measure, cosine similarity, correlation similarity.
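The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions. It assumes a multi-viewpoint similarity (MVS) in which two documents d_i and d_j are compared from the standpoint of other documents d_h, by averaging a base similarity of the difference vectors d_i - d_h and d_j - d_h; plugging in cosine or Pearson correlation as the base stands in for the two variants the paper compares. Using every other document as a viewpoint (rather than only documents outside the pair's cluster), average linkage, and the helper names `cosine`, `correlation`, `mvs`, and `agglomerative` are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def mvs(docs, i, j, base):
    """Multi-viewpoint similarity of docs i and j (hypothetical helper).

    Averages base(d_i - d_h, d_j - d_h) over every other document d_h,
    i.e. each remaining document serves as a viewpoint (an assumption).
    """
    views = [h for h in range(len(docs)) if h not in (i, j)]
    if not views:
        raise ValueError("MVS needs at least three documents")
    total = 0.0
    for h in views:
        di = [a - c for a, c in zip(docs[i], docs[h])]
        dj = [b - c for b, c in zip(docs[j], docs[h])]
        total += base(di, dj)
    return total / len(views)


def agglomerative(docs, k, base):
    """Average-linkage agglomerative clustering on the MVS matrix, stopping at k clusters."""
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = mvs(docs, i, j, base)

    clusters = [[i] for i in range(n)]  # start with one singleton cluster per document
    while len(clusters) > k:
        best, pair = -math.inf, None
        for a, b in combinations(range(len(clusters)), 2):
            # average pairwise MVS between the members of the two clusters
            link = sum(sim[x][y] for x in clusters[a] for y in clusters[b])
            link /= len(clusters[a]) * len(clusters[b])
            if link > best:
                best, pair = link, (a, b)
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] += clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]  # toy term counts
print(agglomerative(docs, 2, cosine))       # [[0, 1], [2, 3]]
print(agglomerative(docs, 2, correlation))  # [[0, 1], [2, 3]]
```

On these four toy term-frequency vectors both bases recover the two topics. They coincide here only because every toy document has the same total term count, so each difference vector already has zero mean; on real collections with varying document lengths, the correlation and cosine variants generally give different similarity values.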

Similar resources

An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network

Clustering is a useful method that categorizes a large quantity of unordered text documents into a small number of meaningful and coherent collections, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Different types of distance functions and similarity measures have been used for clustering, such as squared, cosine similarity, Euclidean distance and ...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining; it is the unsupervised classification of similar documents into different groups. The most important step...

Hierarchical Divisive Clustering with Multi View-Point Based Similarity Measure

All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours i...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

A Relative Approach to Hierarchical Clustering

This paper presents a new approach to agglomerative hierarchical clustering. Classical hierarchical clustering algorithms are based on metrics which only consider the absolute distance between two clusters, merging the pair of clusters with the highest absolute similarity. We propose a relative dissimilarity measure, which considers not only the distance between a pair of clusters, but also how dis...


Publication date: 2013